Kevin,
I appreciate your involvement here, and your willingness to dive deeper into this for my (and hopefully others') education.
Some of the things you explain about performance were already clear to me (long ago, I wrote file systems and a simple operating system).
Some were not clear. For example, I had assumed that some of the NS-based calls might happen in kernel space, where calls to the lower APIs would be less wasteful, or that the NSURLs these calls return would be pooled at a more efficient level. (I only recently learned from Jim L that, when using NSString for paths, the fileSystemRepresentation was the key to keeping things fast.)
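To show what I mean by Jim's hint, here's a minimal sketch (not my actual code; the path and variable name are just examples): going from an NSString path straight to a POSIX call via fileSystemRepresentation, without a detour through NSURL:

```swift
import Foundation

// Hedged sketch: take an NSString path straight to a POSIX call via its
// fileSystemRepresentation, avoiding the NSURL machinery entirely.
let somePath = "/Applications" as NSString   // illustrative path
var info = stat()
if lstat(somePath.fileSystemRepresentation, &info) == 0 {
    // S_IFMT/S_IFDIR mask the file-type bits of st_mode
    let isDirectory = (info.st_mode & mode_t(S_IFMT)) == mode_t(S_IFDIR)
    print("directory:", isDirectory)
}
```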
I also agree that some of the properties, such as NSURLIsVolumeKey, shouldn't be costly, because they should not require another call into the BSD/POSIX APIs; I'd have thought that the information is already present at the level above the VFS. That's especially true for NSURLIsDirectoryKey.
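For reference, this is the pattern I'm talking about (a minimal sketch with an illustrative path, not my actual scanner): the keys are prefetched when the enumerator is created, and the later resourceValues lookups should then be served from the URL's resource cache rather than triggering another round trip to the file system:

```swift
import Foundation

// Hedged sketch: prefetch the keys when creating the enumerator, then read
// them back; ideally the reads come from the URL's resource cache.
let keys: [URLResourceKey] = [.isDirectoryKey, .isVolumeKey]
let root = URL(fileURLWithPath: "/Applications", isDirectory: true)  // illustrative
if let enumerator = FileManager.default.enumerator(at: root,
                                                   includingPropertiesForKeys: keys) {
    for case let url as URL in enumerator {
        let values = try? url.resourceValues(forKeys: Set(keys))
        if values?.isDirectory == true {
            print("dir:", url.lastPathComponent)
        }
    }
}
```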
[quote='830058022, DTS Engineer, /thread/776198?answerId=830058022#830058022']
Architecturally, the performance here should basically be "identical", as the ONLY reason "hasDirectoryPath" can work at all is that the enumeratorAtURL "told" the URL it was a directory at the point it was created
[/quote]
Exactly my thought; I concluded that from logic, without looking at the source code. You're aware that NSURLs append a "/" suffix to directories, while paths do not, right? So I suspect that hasDirectoryPath simply checks for that trailing "/", which would explain why it's so fast.
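That's easy to see from the outside (a small illustration; the expected output in the comments is what I observe, not anything confirmed from the Foundation source):

```swift
import Foundation

// Directory URLs carry a trailing slash; plain paths do not.
let dirURL = URL(fileURLWithPath: "/Applications", isDirectory: true)
print(dirURL.absoluteString)    // "file:///Applications/", note the trailing "/"
print(dirURL.hasDirectoryPath)  // true
print(dirURL.path)              // "/Applications", the path form drops the slash
```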
BTW: if you time the "find" tool, you'll see that it can be almost twice as fast as my DirScanner tests, which surprised me. I still wonder whether NSString conversions play a role in that; the find tool doesn't have to worry about those.
[quote='830058022, DTS Engineer, /thread/776198?answerId=830058022#830058022']the only explanation I have is that there's a serious issue in our current implementation and the resource cache simply isn't working properly. Please file a bug on this and send me the bug number once it's filed.[/quote]
Huh, well, okay. I was expecting that anyway, but never bothered to report it as a problem because I didn't think anyone would bother with it. I had reported, for instance, an issue with searchfs(), which has become much, much slower on APFS than on HFS+ (roughly 6 times slower for the same number of files when scanning an entire startup volume), and as far as I can see this was never addressed.
And if we're talking about performance issues on the FS, the worst offender is Finder and its support frameworks. I once ran a trace of calls (e.g. with fs_usage) and found that there are a LOT of repeat calls for the same attributes, which explains why browsing the Applications folder nowadays, especially on a network volume, is so awfully slow (it's mainly about diving into the bundles). THAT needs addressing, and it's so obviously bad; it should be obvious to anyone using a Mac for a few hours, especially if they have some experience of how fast it should be (and indeed used to be!).

As long as that isn't tackled, I wonder why I should make an effort to report these rather minor bottlenecks. But then again, if enumeratorAtURL is used by Finder a lot (we can't dtrace anymore, can we?) and this gets improved, maybe it'll automagically also improve browsing in Finder all over, significantly? If you think so, I'm happy to make the effort. (Feel free to email me directly if you have private comments; I don't want to start a flame war here, but I really do care about this getting better, if I can make a difference. Incidentally, I had applied a few times in the past for a job working on FS improvements, but nothing came of it.)
[quote='830058022, DTS Engineer, /thread/776198?answerId=830058022#830058022']
my advice would probably be to build and optimize entirely on getattrlistbulk.
[/quote]
Yes, that's indeed on my wish list, after seeing how much faster the find tool can be (my app now has a mode where it calls the find tool and then just looks up the paths it reports back, and that's twice as fast). But then, I also want to improve the UI a lot and add caching of the FS by tracking all changes and keeping a shadow directory, and that'll benefit my users more in the long run. There's only so much a single programmer can accomplish :)
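For the record, this is roughly the shape of what I have in mind (a hedged sketch only: the function name and buffer size are mine, the parsing assumes exactly the three attributes requested, and real code needs error handling plus a check of the returned attribute set):

```swift
import Darwin
import Foundation

// Hedged sketch of one directory pass with getattrlistbulk.
// Requests only the name and the object type, so "is it a directory?"
// comes back in the same syscall that enumerates the entries.
func scanDirectory(_ path: String) {
    let dirfd = open(path, O_RDONLY | O_DIRECTORY)
    guard dirfd >= 0 else { return }
    defer { close(dirfd) }

    var attrList = attrlist()
    attrList.bitmapcount = u_short(ATTR_BIT_MAP_COUNT)
    attrList.commonattr = attrgroup_t(ATTR_CMN_RETURNED_ATTRS)
                        | attrgroup_t(ATTR_CMN_NAME)
                        | attrgroup_t(ATTR_CMN_OBJTYPE)

    let bufSize = 256 * 1024
    let buf = UnsafeMutableRawPointer.allocate(byteCount: bufSize,
                                               alignment: MemoryLayout<UInt32>.alignment)
    defer { buf.deallocate() }

    while true {
        let count = getattrlistbulk(dirfd, &attrList, buf, bufSize, 0)
        if count <= 0 { break }  // 0 = no more entries, -1 = error (check errno)

        var entry = buf
        for _ in 0..<count {
            let entryLength = entry.load(as: UInt32.self)
            var field = entry + MemoryLayout<UInt32>.size

            // ATTR_CMN_RETURNED_ATTRS comes first; a robust version would
            // verify here that the attributes read below were actually returned.
            field += MemoryLayout<attribute_set_t>.size

            // ATTR_CMN_NAME: an attrreference_t pointing at the NUL-terminated name.
            let nameRef = field.load(as: attrreference_t.self)
            let name = String(cString: (field + Int(nameRef.attr_dataoffset))
                                           .assumingMemoryBound(to: CChar.self))
            field += MemoryLayout<attrreference_t>.size

            // ATTR_CMN_OBJTYPE: directory vs. file without another syscall
            // (VDIR comes from <sys/vnode.h>).
            let isDirectory = field.load(as: fsobj_type_t.self) == fsobj_type_t(VDIR.rawValue)

            print(name, isDirectory ? "(dir)" : "")
            entry += Int(entryLength)
        }
    }
}
```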
[quote='830058022, DTS Engineer, /thread/776198?answerId=830058022#830058022']
using multiple threads and making multiple calls into getattrlistbulk simultaneously
[/quote]
Yes, that's on my list as well, and yes, especially for network volumes. I had already planned this before SSDs became common; back then the idea was to identify which volumes were on the same HDD and only run concurrent searches on separate HDDs, to avoid excessive seeking. I still have to experiment with this, though, because I wasn't sure whether the VFS would queue calls at a global level. In recent years I got hints that this is not the case, and that multiple FS calls can run concurrently as long as they're on independent file systems, and you seem to indicate the same, which is promising.
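Something like this is what I have in mind for the fan-out (a hedged sketch; scanDirectory is the hypothetical per-volume scanner from the earlier sketch, and the volume paths are just examples):

```swift
import Foundation

// Hedged sketch: one scan per independent file system, run concurrently.
// Whether this pays off depends on the volumes not sharing a physical device.
let volumeRoots = ["/System/Volumes/Data", "/Volumes/ExternalSSD"]  // illustrative
DispatchQueue.concurrentPerform(iterations: volumeRoots.count) { index in
    scanDirectory(volumeRoots[index])  // hypothetical scanner from the sketch above
}
```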
However, most users of my app are simple "home" users who only search their startup disk, and there's little to parallelize there. I had, for instance, considered pre-caching the locked system volume, but searching that volume is fairly quick compared to the /System/Volumes/Data volume, so caching the former doesn't gain much in the overall search time once the user has accumulated lots of files (which happens eventually).
Again, thank you for not only trying to answer my questions but also exploring solutions.